associative memory model
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Storage capacity of perceptron with variable selection
Xu, Yingying, Ohzeki, Masayuki, Kabashima, Yoshiyuki
A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $α$ and the variable selection ratio $ρ$ for which a simple perceptron can perfectly classify $P = αN$ random patterns by optimally selecting $M = ρN$ variables out of $N$ variables. While the Cover--Gardner theory establishes that a random subset of $ρN$ dimensions can separate $αN$ random patterns if and only if $α< 2ρ$, we demonstrate that optimal variable selection can surpass this bound by developing a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.
- North America > United States (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- (3 more...)
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture
Burns, Thomas F, Fukai, Tomoki, Earls, Christopher J
Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities, however their neural architectures differ substantially from LLMs. Despite this, a critical component within LLMs, the attention mechanism, resembles modern associative memory models, widely used in and influenced by the computational neuroscience community to model biological memory systems. Using this connection, we introduce an associative memory model capable of performing ICL. We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads. We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification. We then apply our architecture in small language models with 8 million parameters, focusing on attention head values, with results also indicating improved ICL performance at this larger and more naturalistic scale.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Montserrat (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan (0.04)
Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
Karakida, Ryo, Ota, Toshihiro, Taki, Masato
Transformers have established themselves as the leading neural network model in natural language processing and are increasingly foundational in various domains. In vision, the MLP-Mixer model has demonstrated competitive performance, suggesting that attention mechanisms might not be indispensable. Inspired by this, recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers. However, the theoretical framework for these models remains underdeveloped. This paper proposes a novel perspective by integrating Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the entire Transformer block, encompassing token-/channel-mixing modules, layer normalization, and skip connections, as a single Hopfield network. This approach yields a parallelized MLP-Mixer derived from a three-layer Hopfield network, which naturally incorporates symmetric token-/channel-mixing modules and layer normalization. Empirical studies reveal that symmetric interaction matrices in the model hinder performance in image recognition tasks. Introducing symmetry-breaking effects transitions the performance of the symmetric parallelized MLP-Mixer to that of the vanilla MLP-Mixer. This indicates that during standard training, weight matrices of the vanilla MLP-Mixer spontaneously acquire a symmetry-breaking configuration, enhancing their effectiveness. These findings offer insights into the intrinsic properties of Transformers and MLP-Mixers and their theoretical underpinnings, providing a robust framework for future model design and optimization.
Exploring a Cognitive Architecture for Learning Arithmetic Equations
The acquisition and performance of arithmetic skills and basic operations such as addition, subtraction, multiplication, and division are essential for daily functioning, and reflect complex cognitive processes. This paper explores the cognitive mechanisms powering arithmetic learning, presenting a neurobiologically plausible cognitive architecture that simulates the acquisition of these skills. I implement a number vectorization embedding network and an associative memory model to investigate how an intelligent system can learn and recall arithmetic equations in a manner analogous to the human brain. I perform experiments that provide insights into the generalization capabilities of connectionist models, neurological causes of dyscalculia, and the influence of network architecture on cognitive performance. Through this interdisciplinary investigation, I aim to contribute to ongoing research into the neural correlates of mathematical cognition in intelligent systems.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Education > Curriculum > Subject-Specific Education (0.72)
Noise-Enhanced Associative Memories
Recent advances in associative memory design through structured pattern sets and graph-based inference algorithms allow reliable learning and recall of exponential numbers of patterns. Though these designs correct external errors in recall, they assume neurons compute noiselessly, in contrast to highly variable neurons in hippocampus and olfactory cortex. Here we consider associative memories with noisy internal computations and analytically characterize performance. As long as internal noise is less than a specified threshold, error probability in the recall phase can be made exceedingly small. More surprisingly, we show internal noise actually improves performance of the recall phase. Computational experiments lend additional support to our theoretical analysis. This work suggests a functional benefit to noisy neurons in biological neuronal networks.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (3 more...)
Associative Memories in the Feature Space
Salvatori, Tommaso, Millidge, Beren, Song, Yuhang, Bogacz, Rafal, Lukasiewicz, Thomas
An autoassociative memory model is a function that, given a set of data points, takes as input an arbitrary vector and outputs the most similar data point from the memorized set. However, popular memory models fail to retrieve images even when the corruption is mild and easy to detect for a human evaluator. This is because similarities are evaluated in the raw pixel space, which does not contain any semantic information about the images. This problem can be easily solved by computing \emph{similarities} in an embedding space instead of the pixel space. We show that an effective way of computing such embeddings is via a network pretrained with a contrastive loss. As the dimension of embedding spaces is often significantly smaller than the pixel space, we also have a faster computation of similarity scores. We test this method on complex datasets such as CIFAR10 and STL10. An additional drawback of current models is the need of storing the whole dataset in the pixel space, which is often extremely large. We relax this condition and propose a class of memory models that only stores low-dimensional semantic embeddings, and uses them to retrieve similar, but not identical, memories. We demonstrate a proof of concept of this method on a simple task on the MNIST dataset.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Austria > Vienna (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Bridging Associative Memory and Probabilistic Modeling
Schaeffer, Rylan, Zahedi, Nika, Khona, Mikail, Pai, Dhruv, Truong, Sang, Du, Yilun, Ostrow, Mitchell, Chandra, Sarthak, Carranza, Andres, Fiete, Ila Rani, Gromov, Andrey, Koyejo, Sanmi
Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions. We showcase four examples: First, we propose new energy-based models that flexibly adapt their energy functions to new in-context datasets, an approach we term \textit{in-context learning of energy functions}. Second, we propose two new associative memory models: one that dynamically creates new memories as necessitated by the training data using Bayesian nonparametrics, and another that explicitly computes proportional memory assignments using the evidence lower bound. Third, using tools from associative memory, we analytically and numerically characterize the memory capacity of Gaussian kernel density estimators, a widespread tool in probababilistic modeling. Fourth, we study a widespread implementation choice in transformers -- normalization followed by self attention -- to show it performs clustering on the hypersphere. Altogether, this work urges further exchange of useful ideas between these two continents of artificial intelligence.
- Asia > Middle East > Jordan (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Maryland (0.04)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- (2 more...)
Capacity for Patterns and Sequences in Kanerva's SDM as Compared to Other Associative Memory Models
The information capacity of Kanerva's Sparse, Distributed Memory (SDM) and Hopfield-type neural networks is investigated. Under the approximations used here, it is shown that the to(cid:173) tal information stored in these systems is proportional to the number connections in the net(cid:173) work. The proportionality constant is the same for the SDM and HopJreld-type models in(cid:173) dependent of the particular model, or the order of the model. The approximations are checked numerically. This same analysis can be used to show that the SDM can store se(cid:173) quences of spatiotemporal patterns, and the addition of time-delayed connections allows the retrieval of context dependent temporal patterns.